An abbreviated version of this protocol was published in Science in Oct 2022:
A bacterial phospholipid phosphatase inhibits host pyroptosis by hijacking ubiquitin 


DOI: 10.1126/science.abq0132


Detailed protocol 


Prediction of eukaryotic-like proteins in Mtb 
1. The following databases were used: 
Effective DB (http://effectivedb.org/) 


UniProt (https://www.uniprot.org/)
SMART (http://smart.embl-heidelberg.de/) 


2. The following software tool was used: 
Python (https://repo.anaconda.com/archive/Anaconda3-4.2.0-Windows-x86_64.exe) 


3. The following code was used: 
SMART.txt (See the attached file "SMART.txt") 


4. Procedure: 
1) Download the FASTA file containing the sequences of all Mtb-encoded proteins from the UniProt database
(https://www.uniprot.org/proteomes/UP000001584/), and name it "H37rv.fasta".
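
As a quick sanity check on the downloaded file, the FASTA records can be parsed and counted. The following is a minimal sketch, not part of the attached SMART.txt; the function name `read_fasta` and the toy records are made up for illustration, and the header layout ("db|ACCESSION|ENTRY_NAME description") is assumed to follow the usual UniProt style.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a {accession: sequence} dictionary."""
    records: Dict[str, str] = {}
    accession = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            # Assumed UniProt-style header: "db|ACCESSION|ENTRY_NAME description".
            parts = header.split("|")
            if len(parts) >= 3:
                accession = parts[1]
            else:
                # Fall back to the first whitespace-separated token.
                accession = (header.split() or [""])[0]
            records[accession] = ""
        elif accession is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[accession] += line
    return records


# Toy input with made-up accessions and sequences (not real Mtb proteins).
toy = [
    ">sp|X00001|TOY1_EXAMPLE Toy protein one",
    "MKTAYIAK",
    "QRQISFVK",
    ">tr|X00002|TOY2_EXAMPLE Toy protein two",
    "MSDNE",
]
proteins = read_fasta(toy)
print(len(proteins), "sequences parsed")

# With the real file, one would instead write:
# with open("H37rv.fasta") as handle:
#     proteins = read_fasta(handle)
```

On the real file, the number of parsed sequences can be compared with the protein count shown on the UniProt proteome page.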


[Screenshot: UniProt Proteomes page for Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv); Proteome ID UP000001584; status: reference proteome; protein count 3,995; gene count 3,995; genome assembly GCA_000195955.2 from ENA/EMBL; component: chromosome AL123456.]


2) Download all eukaryotic-like domains (eld_search.csv) from Effective DB (https://effectivedb.org/reports/eld_search?minscore=4&maxscore=10000&maxavg=0.5)


[Screenshot: EffectiveDB "Search eukaryotic-like domains" page. Search parameters: score range 4 - 10000; maximal mean copy number in non-pathogens 0.5. The results table lists, for each domain, its ID, description, mean copy number in non-pathogens, standard deviation of copy number in non-pathogens, and best ELD score.]

3) Install Selenium WebDriver (https://anaconda.org/conda-forge/selenium) in Python. 


conda i 


4) Download SMART.txt, put it in the same directory as H37rv.fasta, and rename "SMART.txt" to "SMART.py" so that it can be run with Python. A file named
"SMART.csv", containing all Mtb-encoded proteins annotated with their functional domains, will be generated after about 1 week of running. (NOTE: ensure the
internet connection is stable and do NOT interrupt it while the script is running.)
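
Because the run lasts about a week, a dropped connection is the main risk. The sketch below illustrates one common way to tolerate short outages: retry a failed request after a pause. It is an illustration only and is not taken from SMART.txt; `call_with_retry`, its parameters, and the simulated `flaky_fetch` are hypothetical names. The built-in `ConnectionError` is used as a stand-in; with the `requests` library one would pass `requests.exceptions.ConnectionError` via `retry_on` instead, since that class does not derive from the built-in one.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    wait_seconds: float = 10.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> T:
    """Call fetch(); on a listed exception, wait and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the error.
                raise
            time.sleep(wait_seconds)
    raise ValueError("max_attempts must be at least 1")


# Simulated request that fails twice, then succeeds (no network involved).
attempts = {"count": 0}


def flaky_fetch() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated dropped connection")
    return "ok"


result = call_with_retry(flaky_fetch, max_attempts=5, wait_seconds=0.0)
print(result, "after", attempts["count"], "attempts")
```

A retry loop of this kind only covers brief interruptions; it does not replace the stable connection that the note above asks for.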


[Screenshot: SMART.py open in the Spyder IDE (Python 3.9). The visible code fragments import BeautifulSoup (bs4), re, itertools, requests and time, fetch pages with a request timeout of 15, wait and retry on requests.exceptions.ConnectionError, and write the collected results to a CSV file through a pandas DataFrame. See the attached file "SMART.txt" for the complete code.]


5) In SMART.csv, Mtb-encoded proteins annotated with a PFAM ID corresponding to a eukaryotic-like domain listed in eld_search.csv were selected and
considered eukaryotic-like proteins.
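
The selection in step 5 amounts to matching the PFAM IDs in SMART.csv against those in eld_search.csv. The following sketch shows that matching on toy data. The column names ("protein", "pfam_id"), the function names, and all IDs are hypothetical placeholders, since the actual column layout of the two files is not described here; stripping a version suffix such as ".2" from the IDs is likewise an assumption.

```python
import csv
import io
from typing import Dict, List, Set, TextIO


def normalize(pfam_id: str) -> str:
    """Trim whitespace and drop an optional version suffix (assumed form 'ID.N')."""
    return pfam_id.strip().split(".")[0]


def load_eld_ids(handle: TextIO, id_column: str) -> Set[str]:
    """Collect the set of eukaryotic-like domain IDs from a CSV file."""
    return {
        normalize(row[id_column])
        for row in csv.DictReader(handle)
        if row.get(id_column)
    }


def select_eukaryotic_like(
    handle: TextIO, eld_ids: Set[str], protein_column: str, pfam_column: str
) -> Dict[str, List[str]]:
    """Map each protein to the eukaryotic-like domain IDs it carries."""
    selected: Dict[str, List[str]] = {}
    for row in csv.DictReader(handle):
        pfam = normalize(row.get(pfam_column) or "")
        if pfam in eld_ids:
            selected.setdefault(row[protein_column], []).append(pfam)
    return selected


# Toy stand-ins for eld_search.csv and SMART.csv (all IDs are invented).
eld_csv = "pfam_id,description\nPF_TOY_A,toy domain A\nPF_TOY_B,toy domain B\n"
smart_csv = (
    "protein,pfam_id\n"
    "RvTOY1,PF_TOY_A\n"
    "RvTOY2,PF_TOY_C\n"
    "RvTOY3,PF_TOY_B.2\n"
    "RvTOY1,PF_TOY_B\n"
)

eld_ids = load_eld_ids(io.StringIO(eld_csv), "pfam_id")
hits = select_eukaryotic_like(io.StringIO(smart_csv), eld_ids, "protein", "pfam_id")
for protein, domains in hits.items():
    print(protein, domains)
```

For the real files, the `io.StringIO` objects would be replaced by `open("eld_search.csv", newline="")` and `open("SMART.csv", newline="")`, with the column names adjusted to match the actual headers.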


Determination of subcellular localization of Mtb eukaryotic-like proteins 
The subcellular localization of each Mtb protein was annotated according to published articles (see the attached file "Refs.docx").


Circular representation of Mtb H37Rv genome using Circos 
1. The following databases were used:




NCBI (https://www.ncbi.nlm.nih.gov/)
PANTHER (http://www.pantherdb.org/)
Mycobrowser (https://mycobrowser.epfl.ch/)


2. The following software tool was used: 
Circos (http://circos.ca/software/download/circos/) 


3. The source code used for Circos plot is available on Zenodo: 
Function annotation of MTB genome (https://doi.org/10.5281/zenodo.7021021)


main.conf


4. Procedure: 

1) Install Circos (http://circos.ca/software/download/circos/).

2) Download all files from Zenodo (https://doi.org/10.5281/zenodo.7021021). [NOTE: these files include MTB.fas, which contains the entire genome sequence
of Mtb H37Rv obtained from NCBI (RefSeq: NC_000962.3), and MTB_protein_class.xlsx, which contains the genome locations of Mtb H37Rv genes
according to the Mycobrowser database and the functional classification of Mtb protein-encoding genes based on the PANTHER database.]

3) Run "circos -conf main.conf" in Shell or CMD.


4) Get the output files "circos.svg" and "circos.png". 
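
Steps 3 and 4 can also be driven from Python, for example to confirm that both output files were produced. The sketch below is one possible wrapper and is not part of the Zenodo files; the helper names are hypothetical, and it assumes that `circos` is callable by that name on the PATH and that main.conf writes its images into the working directory (the actual output location is set inside the configuration file).

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Sequence


def circos_command(conf: str = "main.conf") -> List[str]:
    """Build the command line from step 3."""
    return ["circos", "-conf", conf]


def missing_outputs(
    directory: Path, names: Sequence[str] = ("circos.svg", "circos.png")
) -> List[str]:
    """Return the expected output files that are absent from the directory."""
    return [name for name in names if not (directory / name).is_file()]


def run_circos(workdir: Path) -> List[str]:
    """Run Circos in workdir and report any expected outputs that are missing."""
    if shutil.which("circos") is None:
        raise FileNotFoundError("circos executable not found on PATH")
    subprocess.run(circos_command(), cwd=workdir, check=True)
    return missing_outputs(workdir)


print(" ".join(circos_command()))
# To execute the plot in the folder holding the Zenodo files, one would call:
# missing = run_circos(Path("."))
```

An empty list returned by `run_circos` indicates that both "circos.svg" and "circos.png" are present.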


Related files 


Refs.docx 




SMART.txt